Low Complexity Spectral Imputation for Noise Robust Speech Recognition

Author

  • Julien van Hout
Abstract

Abstract of the thesis "Low Complexity Spectral Imputation for Noise Robust Speech Recognition" by Julien van Hout, Master of Science in Electrical Engineering, University of California, Los Angeles, 2012. Professor Abeer Alwan, Chair.

With the recent push of Automatic Speech Recognition (ASR) capabilities to mobile devices, the user's voice is now recorded in environments with a potentially high level of background noise. To reduce the sensitivity of ASR performance to these distortions, techniques have been proposed that preprocess the speech waveforms to remove noise effects while preserving discriminative speech information. At the expense of increased complexity, recent algorithms have significantly improved recognition accuracy but remain far from human performance in highly noisy environments. With a concern for both complexity and performance, this thesis investigated ways to reduce the corruptive effect of noise by directly weighting the power spectrum (SMF_pow) or log-spectrum (SMF_log) of speech by a mask whose values are within [0,1] and are indexed on the local relative prominence of speech and noise energy. Additional contributions include a low-complexity approach to mask estimation and the use of spectral flooring for matching the dynamic range of clean and noisy spectra. These two techniques are evaluated on two standard noisy ASR databases: the Aurora-2 connected digits recognition task with 11 words, and the Aurora-4 continuous speech recognition task with 5000 words. On the Aurora-2 task, the SMF_log algorithm leads to state-of-the-art performance, with a limited complexity compared to existing techniques. The SMF...
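The idea of weighting a spectrum by a soft mask can be illustrated with a minimal sketch. It is not the thesis's algorithm: the mask formula (estimated speech share of the energy in each time-frequency cell), the peak-relative floor, and all function names and parameters (`soft_mask`, `floor_spectrum`, `floor_db`, and so on) are assumptions chosen only to show a mask in [0,1] being applied to a power spectrum and to a log-spectrum.

```python
import numpy as np


def soft_mask(noisy_power, noise_power, eps=1e-12):
    """Hypothetical soft mask: estimated fraction of each cell's energy
    that belongs to speech. Values lie in [0, 1] when noise_power >= 0."""
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    return speech_power / (noisy_power + eps)


def floor_spectrum(power, floor_db=-40.0):
    """Hypothetical spectral flooring: raise every value below a floor set
    floor_db decibels under the spectrum's peak, limiting dynamic range."""
    floor = power.max() * 10.0 ** (floor_db / 10.0)
    return np.maximum(power, floor)


def weight_power(noisy_power, mask):
    """Weight the power spectrum directly by the mask."""
    return mask * noisy_power


def weight_log(noisy_power, mask, eps=1e-12):
    """Weight the log-spectrum directly by the mask (a literal reading of
    the abstract; the thesis may define this differently)."""
    return mask * np.log(noisy_power + eps)


# Toy data: 5 frames x 8 frequency bins, with an assumed per-bin noise estimate.
rng = np.random.default_rng(0)
noisy = rng.uniform(0.1, 10.0, size=(5, 8))
noise = np.full((1, 8), 0.5)

mask = soft_mask(noisy, noise)
floored = floor_spectrum(noisy, floor_db=-10.0)
masked_power = weight_power(floored, mask)
masked_log = weight_log(floored, mask)
```

In this sketch, cells dominated by noise receive a mask value near 0 and are attenuated, while cells dominated by speech pass nearly unchanged. Flooring is applied before masking here, which is one possible ordering and not a claim about the thesis.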

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Missing Feature Imputation of Log-spectral Data for Noise Robust ASR

In this paper, we present a missing feature (MF) imputation algorithm for log-spectral data with applications to noise robust ASR. Drawing from previous work [1], we adapt the previously proposed spectrographic reconstruction solution to the liftered log-spectral domain by introducing log-spectral flooring (LS-FLR). LS-FLR is shown to be an efficient and effective noise robust feature extractio...

State based imputation of missing data for robust speech recognition and speech enhancement

Within the context of continuous-density HMM speech recognition in noise, we report on imputation of missing time-frequency regions using emission state probability distributions. Spectral subtraction and local signal-to-noise estimation based criteria are used to separate the present from the missing components. We consider two approaches to the problem of classification with missing data: ma...

Mask estimation in non-stationary noise environments for missing feature based robust speech recognition

In missing feature based automatic speech recognition (ASR), the role of the spectro-temporal mask in providing an accurate description of the relationship between target speech and environmental noise is critical for minimizing the degradation in ASR word accuracy (WAC) as the signal-to-noise ratio (SNR) decreases. This paper demonstrates the importance of accurate characterization of instanta...

Robust automatic speech recognition with missing and unreliable acoustic data

Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally-occurring. In these conditions, state-of-the-art automatic speech recognition technology fails. This paper describes an approach to robust ASR which acknowledges the fact that some spectro-temporal regions will be dominated by noise. For the purposes of recognition, these re...

Publication date: 2012